-
Notifications
You must be signed in to change notification settings - Fork 292
Merge feature/vcpu-runnable to feature/numa8 #6540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: feature/numa8
Are you sure you want to change the base?
Conversation
Create a XAPI database with a number of VMs/VDIs/VBDs and measure how long update_allowed_operations takes. Can't really use 2400 VMs here yet, because even with 240 VMs takes ~15s to initialize the test. ``` ╭─────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮ │name │ major-allocated │ minor-allocated │ monotonic-clock │ ├─────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤ │ update_allowed_operations/VDI │ 10145.1862 mjw/run│ 7412588.8431 mnw/run│ 53244625.3769 ns/run│ ╰─────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯ update_allowed_operations/VDI (ns): { monotonic-clock per run = 53244625.376923 (confidence: 53612028.619469 to 53103082.519729); r² = Some 0.992851 } ``` Signed-off-by: Edwin Török <[email protected]>
``` ╭─────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮ │name │ major-allocated │ minor-allocated │ monotonic-clock │ ├─────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤ │ update_allowed_operations/VDI │ 10096.0354 mjw/run│ 7412629.8723 mnw/run│ 53075833.0400 ns/run│ ╰─────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯ update_allowed_operations/VDI (ns): { monotonic-clock per run = 53075833.040000 (confidence: 53469156.908088 to 52924201.003476); r² = Some 0.991453 } ``` Signed-off-by: Edwin Török <[email protected]>
O(log n) instead of O(n) complexity. Also filtering can be done more efficiently. ``` ╭─────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮ │name │ major-allocated │ minor-allocated │ monotonic-clock │ ├─────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤ │ update_allowed_operations/VDI │ 9874.6062 mjw/run│ 7412584.1277 mnw/run│ 51364914.0200 ns/run│ ╰─────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯ update_allowed_operations/VDI (ns): { monotonic-clock per run = 51364914.020000 (confidence: 51756944.767313 to 51194994.874696); r² = Some 0.990799 } ``` On this test ~2-3% improvement (potentially more on larger lists). Signed-off-by: Edwin Török <[email protected]>
Perform the cheaper check first, so that it will short-circuit the evaluation when false. ``` ╭─────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮ │name │ major-allocated │ minor-allocated │ monotonic-clock │ ├─────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤ │ update_allowed_operations/VDI │ 9884.5615 mjw/run│ 7412534.5215 mnw/run│ 53189355.8923 ns/run│ ╰─────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯ update_allowed_operations/VDI (ns): { monotonic-clock per run = 53189355.892308 (confidence: 53722938.915014 to 52908047.166446); r² = Some 0.992578 } ``` Signed-off-by: Edwin Török <[email protected]>
Activate old xapi_vdi.update_allowed_operations optimization: get_internal_records_where does a linear scan currently, so operating on N VDIs is O(N^2). Look at the VBD records directly, like before this 2013 commit which regressed it: 5097475 (We are going to optimize get_record separately so it doesn't go through serialization) For now only do this when run on the coordinator to avoid potentially large number of VBD round-trip database fetches. We'll need to optimize the 'get_internal_record_where' later to also speed up the pool case. ``` ╭─────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮ │name │ major-allocated │ minor-allocated │ monotonic-clock │ ├─────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤ │ update_allowed_operations/VDI │ 9205.8042 mjw/run│ 964577.0228 mnw/run│ 2868770.0725 ns/run│ ╰─────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯ update_allowed_operations/VDI (ns): { monotonic-clock per run = 2868770.072546 (confidence: 2947963.590731 to 2834338.835371); r² = Some 0.404284 } ``` Compared to the previous commit this is 18x faster. Signed-off-by: Edwin Török <[email protected]>
…onfigure Add new host object fields: - ssh_enabled - ssh_enabled_timeout - ssh_expiry - console_idle_timeout Add new host/pool API to enable to set a temporary enabled SSH service timeout - set_ssh_enabled_timeout Add new host/pool API to enable to set console timeout - set_console_idle_timeout Signed-off-by: Lunfan Zhang <[email protected]>
This PR introduces support for Dom0 SSH control, providing the following capabilities: Query the SSH status. Configure a temporary SSH enable timeout for a specific host or all hosts in the pool. Configure the console idle timeout for a specific host or all hosts in the pool. Changes New Host Object Fields: - `ssh_enabled`: Indicates whether SSH is enabled. - `ssh_enabled_timeout`: Specifies the timeout for temporary SSH enablement. - `ssh_expiry`: Tracks the expiration time for temporary SSH enablement. - `console_idle_timeout`: Configures the idle timeout for the console. New Host/Pool APIs (This PR only include the change of data model, the implementation of this API will be include in the next PR): - `set_ssh_enabled_timeout`: Allows setting a temporary timeout for enabling the SSH service. - `set_console_idle_timeout`: Allows configuring the console idle timeout.
Add "Changed" records for 2 APIs which were missed. Fix "param_release" for 3 added parameters. Signed-off-by: Gang Ji <[email protected]>
During pool join, create a new host obj in the remote pool coordinator DB with the same SSH settings as pool coordinator. Also configure SSH service locally before xapi restart which will persist after xapi restart. Signed-off-by: Gang Ji <[email protected]>
Signed-off-by: Steven Woods <[email protected]>
This consists of two parts, splitting the attach_and_activate into functionally equivalent attach and activate functions, and splitting the VBD_plug atomic itself's behaviour into two new atomics, VBD_attach and VBD_activate. If the new xenopsd_vbd_plug_unplug_legacy flag is true, the only difference will be that VBD_plug calls attach and activate sequentially instead of attach_and_activate. If xenopsd_vbd_plug_unplug_legacy is false, the VBD_attach and VBD_activate atomics will be used in place of VBD_plug in all places that it is used. This should still be functionally equivalent. The purpose of this change is to allow optimisations in this area as there are some situations where they do not need to be called at the same time. For example VBD_attach could be called outside of VM migrate downtime to reduce the overall downtime. Signed-off-by: Steven Woods <[email protected]>
This consists of two parts, splitting the unplug function into functionally equivalent deactivate and detach functions, and splitting the VBD_unplug atomic itself's behaviour into two new atomics: VBD_deactivate and VBD_detach If the xenopsd_vbd_plug_unplug_legacy flag is true, the only difference will be that VBD_unplug calls deactivate and detach sequentially instead of unplug If xenopsd_vbd_plug_unplug_legacy is false, the VBD_deactivate and VBD_detach atomics will be used in place of VBD_unplug in all places that it is used. This should still be functionally equivalent. The purpose of this change is to allow optimisations in this area as there are some situations where they do not need to be called at the same time. For example we could skip detaching on reboot and only deactivate and activate again reducing reboot time. Signed-off-by: Steven Woods <[email protected]>
Useless here, the local DB will be dropped soon as the joinner will switch to the remote DB of the new coordinator. And latest_synced_updates_applied will be set to `unknown in host.create in remote DB as default value. Signed-off-by: Gang Ji <[email protected]>
After being ejected from a pool, a new host obj will be created with default settings in DB. This commit configures SSH service in the ejected host to default state during pool eject. Signed-off-by: Gang Ji <[email protected]>
Signed-off-by: Gang Ji <[email protected]>
Signed-off-by: Gang Ji <[email protected]>
Implemented XAPI APIs: - `host.set_console_idle_timeout` - `pool.set_console_idle_timeout` These APIs allow XAPI to configure timeout for idle console sessions. Signed-off-by: Lunfan Zhang <[email protected]>
Implemented XAPI APIs: - `host.set_ssh_enabled_timeout` - `pool.set_ssh_enabled_timeout` These APIs allow XAPI to configure timeout for SSH service. `host.enable_ssh` now also supports enabling the SSH service with a ssh_enabled_timeout Signed-off-by: Lunfan Zhang <[email protected]>
Updated `records.ml` file to support `host-param-set/get/list` and `pool-param-set/get/list` for SSH-related fields. Signed-off-by: Lunfan Zhang <[email protected]>
Implemented XAPI APIs: - `set_ssh_enabled_timeout` - `set_console_idle_timeout` These APIs allow XAPI to configure timeouts for the SSH service and idle console sessions from both host and pool level. Updated `records.ml` to support `host-param-set/get/list` and `pool-param-set/get/list` for SSH-related fields.
The error set_console_idle_timeout_failed was added in feature branch while it is not used anywhere. The error used in set_console_idle_timeout now is invalid_value. Signed-off-by: Gang Ji <[email protected]>
Signed-off-by: Zeroday BYTE <[email protected]>
- Ensure host.enabled_ssh reflects the actual SSH service state on startup, in case it was manually changed by the user. - Reschedule the "disable SSH" job if: - host.ssh_enabled_timeout is set to a positive value, and - host.ssh_expiry is in the future. - Disable the SSH if: - host.ssh_enabled_timeout is set to a positive value, and - host.ssh_expiry is in the past. Signed-off-by: Lunfan Zhang <[email protected]>
merge master to feature branch
Unfortunately mirage-crypto has accumulated breaking changes: - Cstructs have been replaced with strings - The digestif library has replaced ad-hoc hash implementation A deprecation has happened as well: - RNG initialization has changed Because there are breaking changes, xs-opam changes need to be introduced at the same time: xapi-project/xs-opam#731 Only xapi is affected by the breaking builds, so no other toolstack repositories have incoming PRs. I've tested builds with Smoke and validation tests: SR 218740 This means that the merge will be done as such: - Both PRs are approved - First I will merge xs-opam will be emrged (with failing CI) - Then this PR will be merged with the merge train that runs tests before actually merging. - CI is rerun manually on xs-opam to make it green again After merging both xenserver's CI should create a successful build with both PR included
The /sys/fs/cgroup/systemd/cgroup.procs file is not always present, particularly in updated Linux systems with newer cgroup and SystemD. So fallback to root /sys/fs/cgroup/cgroup.procs. Also handle and report errors back to Ocaml. Although SystemD discourage handling cgroups without service configuration changes the root cgroup is a bit special as receiving processes from multiple sources, including the kernel.
Also fix the Makefile, so that 'make clean' also deletes the `.o.d` files. This avoids accidentally adding these files to git (although normally dune would invoke make in _build, only if you manually invoke it would it create these extra files): ``` A ocaml/forkexecd/helper/close_from.o A ocaml/forkexecd/helper/close_from.o.d A ocaml/forkexecd/helper/syslog.o A ocaml/forkexecd/helper/syslog.o.d A ocaml/forkexecd/helper/vfork_helper A ocaml/forkexecd/helper/vfork_helper.o A ocaml/forkexecd/helper/vfork_helper.o.d ``` Signed-off-by: Edwin Török <[email protected]>
Also fix the Makefile, so that 'make clean' also deletes the `.o.d` files. This avoids accidentally adding these files to git (although normally dune would invoke make in _build, only if you manually invoke it would it create these extra files): ``` A ocaml/forkexecd/helper/close_from.o A ocaml/forkexecd/helper/close_from.o.d A ocaml/forkexecd/helper/syslog.o A ocaml/forkexecd/helper/syslog.o.d A ocaml/forkexecd/helper/vfork_helper A ocaml/forkexecd/helper/vfork_helper.o A ocaml/forkexecd/helper/vfork_helper.o.d ```
Signed-off-by: Andrii Sultanov <[email protected]>
on every toolstack restart on hosts without Nvidia GPUs, xapi complains about a non-existent directory: xapi: [error||0 |dbsync (update_env) R:733fc2551767|xapi_vgpu_type] Failed to create NVidia compat config_file: Sys_error("/usr/share/nvidia/vgx: No such file or directory") Handle the directory's absence without propagating the error. Signed-off-by: Andrii Sultanov <[email protected]>
…_line Otherwise this quite frequently logs something like: ``` xcp-networkd: [error||22 |dbsync (update_env)|network_utils] Error in read one line of file: /sys/class/net/eth0/device/sriov_totalvfs, exception Unix.Unix_error(Unix.ENOENT, "open", "/sys/class/net/eth0/device/sriov_totalvfs") Backtrace ... ``` Signed-off-by: Andrii Sultanov <[email protected]>
non_debug_receive will dump an error after reading the last bits of the header, which is expected and handled by the caller appropriately: ``` xenopsd-xc: [error||67 |Async.VM.resume R:beac7be348f1|xenguesthelper] Memory F 6019464 KiB S 0 KiB T 8183 MiB <--- dumping error xenopsd-xc: [debug||67 |Async.VM.resume R:beac7be348f1|mig64] Finished emu-manager result processing <---- End_of_file expected and handled ``` Don't pollute the logs and instead just log the same info with 'debug' when the error is End_of_file. Signed-off-by: Andrii Sultanov <[email protected]>
It's an in-house joke that most of `xapi`'s errors are "expected" (either because they're handled by the caller after being logged or because they are frequent and benign) and knowing which ones are unusual and actually critical takes a lot of learning. Try to remove at least some of the most obvious ones from the toolstack startup/VM resume, either removing the logs completely or downgrading it to `debug` when appropriate. Manually tested in the appropriate scenarios.
…ions While one could potentially filter for this "value", I don't think it's that useful and adds noise to the completions, like here: ``` $ xe vm-list resident-on= 64c11cad-2c52-4dea-aea6-5fae0e720699 \<not\ in\ database\> 7f566729-0ee7-47c4-853d-2c5f3a195ad4 ``` Signed-off-by: Andrii Sultanov <[email protected]>
log needs to be moved one line below the first assignment into the "description" variable, otherwise it's always going to be an empty string Signed-off-by: Andrii Sultanov <[email protected]>
We used to rely on Bash's completion removing duplicate entries from the suggestions, but processing them in the first place is unnecessary (and will slow down completion since there's usually an 'xe' command run for each entry in the wordlist). Signed-off-by: Andrii Sultanov <[email protected]>
Provide a helpful description for some parameters of 'xe vm-list', compare before: ``` $ xe vm-list resident-on= 64c11cad-2c52-4dea-aea6-5fae0e720699 7f566729-0ee7-47c4-853d-2c5f3a195ad4 ``` with after: ``` $ xe vm-list resident-on= 64c11cad-2c52-4dea-aea6-5fae0e720699 - hpmc30 7f566729-0ee7-47c4-853d-2c5f3a195ad4 - hpmc29 ``` Signed-off-by: Andrii Sultanov <[email protected]>
No completion was provided before, and it's handled properly now: ``` $ xe vm-list suspend-SR-uuid= 08906228-cbf6-dad4-720d-e581df11510a - SR1 37b734f0-e594-0e48-2114-cd063241dd36 - SR2 ``` Signed-off-by: Andrii Sultanov <[email protected]>
from https://en.wikipedia.org/wiki/Passwd#Password_file uid_info as following format username:password:uid:gid:gecos:homedir:shell Regarding gecos, it is recommended as follows Typically, this is a set of comma-separated values including the user's full name and contact details. However, this information comes form AD and user may mis-configure it with `:`, which is used as seperator. In such case, the parse would failed. Enhance the parse function to support `:` in gecos, other fields does not likely contain it. Signed-off-by: Lin Liu <[email protected]>
from https://en.wikipedia.org/wiki/Passwd#Password_file uid_info as following format username:password:uid:gid:gecos:homedir:shell Regarding gecos, it is recommended as follows Typically, this is a set of comma-separated values including the user's full name and contact details. However, this information comes form AD and user may mis-configure it with `:`, which is used as seperator. In such case, the parse would failed. Enhance the parse function to support `:` in gecos, other fields does not likely contain it.
Instruments: - `VM.add`, - `VM.stat`, - `VM.exists`, - `VM.list`. Signed-off-by: Gabriel Buica <[email protected]>
Instruments `switch_rpc` according to OpenTelemetry standard on instrumenting rpc calls. - `server.address` is the name of the message queue. Intruments sending the message on a queue according to OpenTelemetry standard on instrumenting messaging. - `destination` is the name of the message queue. `Tracing.with_tracing` now accepts an optional argument to set the Span Kind. Signed-off-by: Gabriel Buica <[email protected]>
Instrument xenops vm non-atomic functions. Instruments: - `VM.add`, - `VM.stat`, - `VM.exists`, - `VM.list`. Instruments `switch_rpc` according to OpenTelemetry standard on instrumenting rpc calls. - `server.address` is the name of the message queue. Intruments sending the message on a queue according to OpenTelemetry standard on instrumenting messaging. - `destination` is the name of the message queue. `Tracing.with_tracing` now accepts an optional argument to set the Span Kind. 
Maintenance mode is entered by running Host.evacuate, followed by promoting a new pool coordinator and shutting down XAPI. We only export spans every 30s, so we may miss exporting the span for Host.evacuate. Ensure that we at least trigger the export when XAPI is about to shutdown. Do not wait for the export to finish, because this could take a long time (e.g. when exporting to a remote Jaeger instance). After this change I now see Host.evacuate properly in the exported trace. Signed-off-by: Edwin Török <[email protected]>
json_reformat cannot handle newline delimited json, it is easier if we have a command to reformat it ourselves. This can be useful when debugging why a trace is missing elements. Traces are stored as newline-delimited JSON in /var/log/dt/zipkinv2/json, however json_reformat cannot process them directly, and the lines can be very long and difficult to read otherwise. Signed-off-by: Edwin Török <[email protected]>
#6525) Soon after Host.evacuate XAPI could be restarted (e.g. on coordinator promotion). But we only export traces every 30s, so we lose the spans from the last 30s, including the toplevel Host.evacuate span (which although long running is only emitted on completion). After this change I'm now able to see Host.evacuate and all the migrate calls in the exported distributed trace.
The renaming is done to resolve one problem with rrd2csv: rrd2csv selected both RRD1("runnable") and RRD2("runnable_vcpus") when the RRD1 name is used: > rrd2csv AVERAGE:vm:<vm-uuid>:runnable > timestamp, AVERAGE:vm:<vm-uuid>:runnable, AVERAGE:vm:<vm-uuid>:runnable_vcpus This is because "runnable" is a prefix of "runnable_vcpus". After the renaming, with rrd2csv: * can select only RRD1 if we use: rrd2csv AVERAGE:vm:<vm-uuid>:runnable_any * can select only RRD2 if we use: rrd2csv AVERAGE:vm:<vm-uuid>:runnable_vcpus * can select both RRD1 and RRD2 if we use: rrd2csv AVERAGE:vm:<vm-uuid>:runnable And the renaming also makes it clearer as the 'runnable' metric is % of time that at least one vCPU of the domain is in the runnable state. Signed-off-by: Gang Ji <[email protected]>
Follow the fix here: #6493 Signed-off-by: Gang Ji <[email protected]>
I think a merge from master to the feature branch would be much better (I actually would prefer a rebase, seeing that this branch is only two patches right now) |
Thanks for the review. But |
Merge feature/vcpu-runnable as there were some little tweaks for RRD1: #6530